---
permalink: /textanalysis/
keywords: fastai
description: "Awesome summary"
title: Text analysis
toc: false
branch: master
badges: true
comments: true
categories: [text analysis, sentiment analysis, wordclouds]
image: images/some_folder/your_image.png
hide: true
search_exclude: false
metadata_key1: metadata_value1
metadata_key2: metadata_value2
nb_path: _notebooks\05_Text_Analysis.ipynb
layout: notebook
---
This section on text analysis is divided into two parts: wordclouds and sentiment analysis. Both the extracted wiki pages and the character dialogues will be used, and we will investigate how the wordclouds and the sentiment analysis differ between the two data sets.
First, we will take a look at wordclouds. As mentioned before, both the extracted wiki pages and the full series dialogue will be investigated. We start by generating wordclouds for characters of interest. Here, we have selected the characters Jon Snow, Arya Stark, Bronn, Brienne of Tarth and Jaime Lannister. The first step in generating the wordclouds is to compute the term frequency-inverse document frequency (TF-IDF) for each text corpus, i.e. the wiki pages and the episode dialogues. For further explanation of TF-IDF and its computation we refer to the Explainer Notebook. It should be mentioned that we have removed all characters' names from the text corpus, as these would not be very descriptive of a character in a wordcloud or during sentiment analysis.
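To make the TF-IDF step concrete, here is a minimal sketch using only the Python standard library. It assumes one common variant (term frequency as count divided by document length, inverse document frequency as log(N / df)); the exact formula used for this page is the one given in the Explainer Notebook and may differ. The function names, the `excluded` set and the toy texts are hypothetical and only serve as illustration.

```python
import math
import re
from collections import Counter


def tokenize(text, excluded=frozenset()):
    """Lowercase, split on non-letters, and drop excluded words (e.g. character names)."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in excluded]


def tf_idf(documents, excluded=frozenset()):
    """Return {doc_name: {term: score}} with TF = count / doc length and IDF = log(N / df).

    This is an assumed variant of TF-IDF, not necessarily the one from the Explainer Notebook.
    """
    counts = {name: Counter(tokenize(text, excluded)) for name, text in documents.items()}
    n_docs = len(counts)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    scores = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values())
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        }
    return scores


# Toy stand-ins for the real corpus (made-up text, not actual dialogue or wiki content).
documents = {
    "Jon Snow": "Jon swore an oath to the watch and the wall",
    "Arya Stark": "Arya keeps a list and a needle and the list grows",
}
excluded = {"jon", "snow", "arya", "stark"}  # character names removed before scoring

scores = tf_idf(documents, excluded)
top_terms = sorted(scores["Arya Stark"], key=scores["Arya Stark"].get, reverse=True)[:3]
print(top_terms)
```

Words that occur in every document, such as "the" in this toy corpus, get a score of zero under this variant, which is why TF-IDF tends to surface words that are specific to one character.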
Now, let's take a look at the generated wordclouds for the selected characters.
When comparing the generated wordclouds for the two data sets, it should be noted that, for the most part, the same words do not appear for a given character in both. This is expected: the text from a character's wiki page describes the character and their place in the story, whereas the wordcloud from the dialogue is exactly that, the character's most descriptive words, according to TF-IDF, as used throughout the series. It would be interesting to compare this with the sentiment analysis, which is the second part of this page.
Next, we will generate wordclouds based on the characters' allegiance. This is done by pooling the dialogue text of characters belonging to the same allegiance and, again, computing the respective TF-IDF scores in order to generate the wordclouds. For this, we have selected the houses Stark, Lannister, Targaryen and Greyjoy, and the independent group the Night's Watch. It would be interesting to see whether the houses' mottos appear in these wordclouds. The respective house mottos are:
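The visual comparison could also be backed by a number. The sketch below takes the top-k terms of a character in each data set and computes their Jaccard index (shared terms divided by all terms). This measure is our own illustrative choice and is not part of the analysis on this page; the scores are made-up values, not results.

```python
def top_k(term_scores, k):
    """Return the set of the k highest-scoring terms."""
    return set(sorted(term_scores, key=term_scores.get, reverse=True)[:k])


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical TF-IDF scores for one character in each data set (not real results).
wiki_scores = {"king": 0.9, "battle": 0.7, "wall": 0.5, "oath": 0.2}
dialogue_scores = {"know": 0.8, "wall": 0.6, "aye": 0.4, "battle": 0.1}

overlap = jaccard(top_k(wiki_scores, 3), top_k(dialogue_scores, 3))
print(f"Top-3 overlap: {overlap:.2f}")
```

A value close to 0 would correspond to the observation that the two wordclouds share few words.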
As the Night's Watch is not a House but rather a brotherhood sworn to protect The Wall, they do not have a motto.
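The pooling step described above can be sketched as follows. The data layout (a list of `(speaker, line)` pairs and a speaker-to-allegiance mapping) is an assumption made for illustration, and the example lines are invented placeholders rather than real dialogue.

```python
from collections import defaultdict


def pool_by_allegiance(lines, allegiance_of):
    """Concatenate all dialogue lines of speakers that share an allegiance.

    lines: iterable of (speaker, text) pairs (assumed layout).
    allegiance_of: dict mapping speaker -> allegiance; unknown speakers are skipped.
    """
    pooled = defaultdict(list)
    for speaker, text in lines:
        allegiance = allegiance_of.get(speaker)
        if allegiance is not None:
            pooled[allegiance].append(text)
    return {allegiance: " ".join(texts) for allegiance, texts in pooled.items()}


# Invented placeholder lines, only to show the shape of the data.
lines = [
    ("Jaime Lannister", "first placeholder line"),
    ("Arya Stark", "second placeholder line"),
    ("Tyrion Lannister", "third placeholder line"),
    ("Unnamed Soldier", "line from a speaker without a known allegiance"),
]
allegiance_of = {
    "Jaime Lannister": "Lannister",
    "Tyrion Lannister": "Lannister",
    "Arya Stark": "Stark",
}

pooled = pool_by_allegiance(lines, allegiance_of)
print(pooled["Lannister"])
```

Each pooled string can then be treated as a single document when computing TF-IDF, so that the scores describe an allegiance rather than an individual character.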
When comparing the wordclouds above with the respective house mottos, only "Hear" from the Lannisters' "Hear Me Roar!" (large, in the middle of the wordcloud) is present. All the wordclouds are, however, very descriptive of the respective houses. For instance, for the Night's Watch, a military order sworn to protect The Wall, words like protect, wildling and swear are present. The same can be said for House Targaryen, whose main character, Daenerys, is married to a Dothraki warlord and, later in the show, leads the Dothraki people herself.
We will now generate wordclouds based on the wiki pages' season sections. It would be interesting to see how these wordclouds change as the story unfolds. It would also be interesting to investigate whether the overall theme of the series changes over its course and whether this can be seen in the wordclouds.
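Checking the wordclouds against the mottos by eye can be complemented by a small lookup. The sketch below tests which words of a motto occur among the terms shown in a wordcloud. The helper name and the list of cloud terms are hypothetical; only the Lannister motto itself is taken as given.

```python
import re


def motto_hits(motto, cloud_terms):
    """Return the words of the motto (in order) that also occur among the wordcloud terms."""
    motto_words = re.findall(r"[a-z']+", motto.lower())
    terms = {term.lower() for term in cloud_terms}
    return [word for word in motto_words if word in terms]


# Hypothetical top terms of a wordcloud (illustrative, not the actual cloud content).
cloud_terms = ["hear", "gold", "debt", "father"]

hits = motto_hits("Hear Me Roar!", cloud_terms)
print(hits)
```

Note that short words such as "me" are often removed as stop words before the wordcloud is built, so a partial match on a motto is not surprising.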